Matrix Multiplication Specialization in STAPL

Authors

Adam Fidel

Lena Olson

Antal Buss

Timmie Smith

Gabriel Tanase

Nathan Thomas

Mauro Bianco

Nancy M. Amato

Lawrence Rauchwerger

Abstract

The Standard Template Adaptive Parallel Library (STAPL) is a superset of C++’s Standard Template Library (STL) that supports high-productivity parallel programming in both distributed and shared memory environments. The framework provides parallel equivalents of STL containers and algorithms, which eases development for parallel systems. In this paper, we discuss our methodology for implementing a fast and efficient matrix multiplication algorithm in STAPL. Our implementation employs external linear algebra libraries, specifically the Basic Linear Algebra Subprograms (BLAS) library, which includes highly optimized sequential matrix operations. We describe the benefits of a parallel matrix multiplication algorithm whose library calls are specialized based on both the matrix storage and the traversal. This specialization technique ensures that the implementation best suited to the data access pattern and data structure is used, resulting in increased efficiency compared to a non-specialized approach.
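The idea of specializing on storage can be illustrated with a minimal C++ sketch. Everything in it is an assumption made for illustration: the tags `row_major` and `col_major`, the `Matrix` type, the `Kernel` enum, and the `multiply` overloads are hypothetical names and are not part of STAPL or BLAS. A generic kernel handles any combination of layouts through element access, while an overload for the all-row-major case walks memory contiguously; in a real implementation, that overload is the kind of place where a call to an optimized BLAS routine might go.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage tags (illustrative only, not STAPL names).
struct row_major {};
struct col_major {};

// Map (i, j) to a linear offset for each storage layout.
inline std::size_t offset(row_major, std::size_t i, std::size_t j,
                          std::size_t /*rows*/, std::size_t cols) {
    return i * cols + j;
}
inline std::size_t offset(col_major, std::size_t i, std::size_t j,
                          std::size_t rows, std::size_t /*cols*/) {
    return j * rows + i;
}

// Dense matrix whose storage layout is part of its type.
template <typename Layout>
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        return data[offset(Layout{}, i, j, rows, cols)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        return data[offset(Layout{}, i, j, rows, cols)];
    }
};

// Reports which kernel was selected, so the dispatch is observable.
enum class Kernel { generic, row_major_specialized };

// Generic fallback: correct for any layout combination, but it goes
// through operator() and may stride through memory.
template <typename LA, typename LB, typename LC>
Kernel multiply(const Matrix<LA>& a, const Matrix<LB>& b, Matrix<LC>& c) {
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < a.cols; ++k) sum += a(i, k) * b(k, j);
            c(i, j) = sum;
        }
    }
    return Kernel::generic;
}

// Specialized overload: all operands are row-major, so an i-k-j loop
// order reads and writes contiguous rows. A tuned library routine
// could be substituted here.
inline Kernel multiply(const Matrix<row_major>& a, const Matrix<row_major>& b,
                       Matrix<row_major>& c) {
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double* c_row = c.data.data() + i * c.cols;
        for (std::size_t j = 0; j < c.cols; ++j) c_row[j] = 0.0;
        for (std::size_t k = 0; k < a.cols; ++k) {
            const double a_ik = a.data[i * a.cols + k];
            const double* b_row = b.data.data() + k * b.cols;
            for (std::size_t j = 0; j < b.cols; ++j) c_row[j] += a_ik * b_row[j];
        }
    }
    return Kernel::row_major_specialized;
}
```

Because the layout is encoded in the matrix type, overload resolution selects the kernel at compile time: a call with three `Matrix<row_major>` arguments binds to the specialized overload, and any other combination falls back to the generic template.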

Related resources

Optimization of Sparse Matrix-Vector Multiplication by Specialization

Program specialization is the process of generating optimized programs based on available inputs. It is particularly applicable when some input data are used repeatedly while other input data vary. Specialization can be employed at compile-time as well as at run-time, depending on when the inputs become available. In this paper we explore the potential for obtaining speed-ups for sparse matrix-...

Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication (Submitted for publication)

Run-time specialization is the process of generating programs based on information available only at run time. This technique has the potential of generating highly efficient codes, at the expense of the overheads of the run-time code generation. It is applicable when some input data is used repeatedly while other input data varies. In this paper we explore the potential for obtaining speedups ...

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication can not run on Fibonacci Hypercube structure, therefore giving a method that can be run on all structures especially Fibonacci Hypercube structure is necessary for parallel matr...

Bacon: A GPU Programming Language With Just in Time Specialization (Draft)

This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly more convenien...
